Skip to content

feat(observability): measure response tokens and cache status#1063

Merged
shaun0927 merged 12 commits into
developfrom
feat/996-token-observability
May 13, 2026
Merged

feat(observability): measure response tokens and cache status#1063
shaun0927 merged 12 commits into
developfrom
feat/996-token-observability

Conversation

@shaun0927
Copy link
Copy Markdown
Owner

Summary

  • adds bounded-cardinality output-size and estimated-token metrics for final MCP tool responses
  • records compression savings and cache status metrics when tools expose those fields
  • adds metric registration tests for the new observability surface

Closes #996.

Alignment / scope review

This stays aligned with OpenChrome's direction: no server-side LLM, no tokenizer dependency, no billing integration, and no persisted response bodies. It measures only payload sizes/status after response shaping and redaction.

Verification

  • /Users/jh0927/openchrome/node_modules/.bin/tsc -p tsconfig.json --pretty false
  • npx jest --config jest.config.js --runInBand tests/metrics/tool-output-observability.test.ts tests/metrics/cookie-scan-metrics.test.ts

Post-merge smoke

Run the issue's OpenChrome MCP verification script/manual flow against /metrics to confirm openchrome_tool_output_bytes, openchrome_tool_estimated_tokens, and cache-status counters appear with bounded labels.

@gemini-code-assist
Copy link
Copy Markdown

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@qodo-code-review
Copy link
Copy Markdown

ⓘ You've reached your Qodo monthly free-tier limit. Reviews pause until next month — upgrade your plan to continue now, or link your paid account if you already have one.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d20405b6f3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/mcp-server.ts Outdated
const metrics = getMetricsCollector();
const bytes = estimateResultPayloadBytes(result);
metrics.observe('openchrome_tool_output_bytes', withTenantLabel({ tool: toolName }), bytes);
metrics.observe('openchrome_tool_estimated_tokens', withTenantLabel({ tool: toolName }), estimateOutputTokensFromChars(bytes));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Compute token estimate from character count

recordToolOutputObservability passes the UTF-8 byte length into estimateOutputTokensFromChars, but that heuristic is documented and named as chars/4. For non-ASCII responses (e.g., CJK or emoji-heavy outputs), byte length is larger than character count, so openchrome_tool_estimated_tokens is systematically inflated and no longer comparable across languages.

Useful? React with 👍 / 👎.

Comment thread src/mcp-server.ts Outdated
const status = typeof obj.status === 'string' ? obj.status : typeof obj.cacheStatus === 'string' ? obj.cacheStatus : null;
if (!status) return null;
const keyVersion = obj.keyVersion ?? obj.version ?? 'unknown';
return { status: status.toUpperCase(), keyVersion: String(keyVersion) };
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Normalize cache key version metric label

extractCacheStatus forwards keyVersion as an arbitrary string, and that value is later emitted as the key_version label in openchrome_cache_status_total without any bounding/normalization. If a tool reports per-request versions (hashes/timestamps), this introduces unbounded label cardinality and can degrade Prometheus memory/query performance.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a6abe02838

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/mcp-server.ts
if (cache) {
metrics.inc('openchrome_cache_status_total', withTenantLabel({
tool: toolName,
status: cache.status,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Constrain cache status labels before emitting metrics

recordToolOutputObservability emits openchrome_cache_status_total with status taken directly from tool output via extractCacheStatus (only uppercased), so any tool that includes request-specific detail in cache status (for example reason strings or embedded IDs) will create unbounded label cardinality and degrade Prometheus memory/query performance. Please normalize this to a small allowlist (e.g., HIT/MISS/BYPASS/ERROR) or map unknown values to a fixed bucket before incrementing the counter.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5d8bb37afb

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/mcp-server.ts
if (raw && typeof raw === 'object') {
const obj = raw as Record<string, unknown>;
const status = typeof obj.status === 'string' ? obj.status : typeof obj.cacheStatus === 'string' ? obj.cacheStatus : null;
if (!status) return null;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Continue cache-status fallback instead of returning early

When raw is an object but does not contain a string status/cacheStatus, this early return null prevents the later structuredContent.cacheStatus fallback from running. In mixed-shape results (for example, cache: { keyVersion: "v2" } plus structuredContent.cacheStatus: "miss"), openchrome_cache_status_total is silently dropped even though a valid cache status is present.

Useful? React with 👍 / 👎.

shaun0927 and others added 2 commits May 13, 2026 01:40
Constraint: OpenChrome must expose token/output savings without adding tokenizer or billing dependencies.
Rejected: Provider-specific tokenizers | they add dependency and billing semantics outside this MCP server boundary.
Confidence: medium
Scope-risk: narrow
Directive: Keep metric labels bounded; never add URL, selector, instruction, or domain labels for these metrics.
Tested: /Users/jh0927/openchrome/node_modules/.bin/tsc -p tsconfig.json --pretty false; npx jest --config jest.config.js --runInBand tests/metrics/tool-output-observability.test.ts tests/metrics/cookie-scan-metrics.test.ts
Not-tested: Live /metrics smoke against a running OpenChrome instance
@shaun0927 shaun0927 force-pushed the feat/996-token-observability branch from 5d8bb37 to 94f4324 Compare May 12, 2026 16:40
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 94f4324d1b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/mcp-server.ts
if (typeof originalChars === 'number' && typeof compressedChars === 'number' && originalChars > compressedChars) {
metrics.observe(
'openchrome_tool_compression_saved_bytes',
withTenantLabel({ tool: toolName, mode: level }),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Normalize compression mode labels before emitting metrics

recordToolOutputObservability forwards _compression.level directly into the mode label for openchrome_tool_compression_saved_bytes. Because MCPResult is free-form, any tool can return request-specific strings in _compression.level, which creates unbounded label cardinality and can degrade Prometheus memory/query performance once compression stats are emitted. Please map this label to a small allowlist (and bucket unknowns) before calling observe.

Useful? React with 👍 / 👎.

@shaun0927
Copy link
Copy Markdown
Owner Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 94f4324d1b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/mcp-server.ts
],
isError: true,
};
this.recordToolOutputObservability(toolName, forbiddenResult);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Bound tool labels before recording denied-call output metrics

recordToolOutputObservability is called on the scope-denied path before verifying that toolName is a registered tool, so a read-only API key can send arbitrary params.name values and create unbounded new time series in openchrome_tool_output_bytes / openchrome_tool_estimated_tokens (and potentially cache-status metrics). This introduces a cardinality-amplification vector from user input and can degrade Prometheus memory/query performance; normalize unknown names to a fixed bucket (e.g. unknown) or only emit these metrics after successful tool lookup.

Useful? React with 👍 / 👎.

Merges the latest develop test-infra repairs so PR #1063 is evaluated against the branch it targets without changing the bounded metrics feature itself.

Constraint: Remote CI failed on stale develop test expectations unrelated to token observability.

Rejected: Editing unrelated doctor/tab tests in this PR | the fixes already landed on develop and should be consumed as base updates.

Confidence: medium

Scope-risk: narrow

Directive: Keep output observability labels bounded; do not add URL, selector, instruction, or domain labels.

Tested: pending rerun after dependency install; remote CI required for full matrix.

Not-tested: Full CI matrix locally.

Co-authored-by: OmX <omx@oh-my-codex.dev>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f5f9adab7f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/mcp-server.ts
Comment on lines +1855 to +1858
const originalChars = (compression as Record<string, unknown>).originalChars;
const compressedChars = (compression as Record<string, unknown>).compressedChars;
const level = String((compression as Record<string, unknown>).level ?? 'unknown');
if (typeof originalChars === 'number' && typeof compressedChars === 'number' && originalChars > compressedChars) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve compression stats before observing saved bytes

recordToolOutputObservability expects _compression.originalChars and _compression.compressedChars, but the success path currently rewrites _compression to { level, verbosity } before this method runs when compression.trackSavings is enabled. In that configuration, any tool-provided compression stats are discarded, so openchrome_tool_compression_saved_bytes never records even when a tool returns valid savings data. Please merge/retain existing _compression fields instead of replacing the object.

Useful? React with 👍 / 👎.

Records original and delta response sizes on real read_page DOM delta responses so compression savings metrics are populated from actual tool output instead of inert placeholder metadata.

Constraint: PR #1063 promised compression savings observability without adding tokenizer or billing dependencies.

Rejected: Provider-specific token accounting | this remains a bounded payload-size heuristic within the MCP server boundary.

Confidence: high

Scope-risk: narrow

Directive: Keep output metric labels bounded and never include URL, selector, instruction, or domain labels.

Tested: npm test -- --runInBand tests/metrics/tool-output-observability.test.ts tests/metrics/cookie-scan-metrics.test.ts; ./node_modules/.bin/tsc -p tsconfig.json --pretty false

Not-tested: Live /metrics smoke against a running OpenChrome instance.

Co-authored-by: OmX <omx@oh-my-codex.dev>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d45a2def9a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/tools/read-page.ts
Comment on lines +551 to +552
originalChars: outputText.length,
compressedChars: compressedText.length,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Count full uncompressed response when reporting delta savings

_compression.originalChars is set from outputText.length, but the non-delta DOM response for the same call is outputText + domPaginationSection. When includePagination is left at its default (true) and pagination metadata is non-empty, this undercounts the baseline and can make originalChars <= compressedChars, so openchrome_tool_compression_saved_bytes is underreported or dropped even when delta compression actually saved bytes.

Useful? React with 👍 / 👎.

shaun0927 and others added 2 commits May 13, 2026 04:18
Cleans up module mocks used by the read_page compression-savings regression test so full-suite CI workers do not inherit mocked session or DOM modules.

Constraint: CI runs many Jest files in shared workers, so test-local doMock calls must not leak beyond this suite.

Rejected: Removing the real read_page delta-path test | it is needed to prove the compression-savings metric is populated.

Confidence: high

Scope-risk: narrow

Directive: Keep observability tests hermetic when mocking runtime modules.

Tested: npm test -- --runInBand tests/metrics/tool-output-observability.test.ts tests/metrics/cookie-scan-metrics.test.ts

Not-tested: Full Jest suite locally after this test-isolation fix.

Co-authored-by: OmX <omx@oh-my-codex.dev>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 53347f80b3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/mcp-server.ts
metrics.observe(
'openchrome_tool_compression_saved_bytes',
withTenantLabel({ tool: toolName, mode: level }),
originalChars - compressedChars,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Convert compression savings metric to true byte deltas

openchrome_tool_compression_saved_bytes is recorded from originalChars - compressedChars, but those _compression fields are character counts (for example read_page sets them from .length). For non-ASCII output (CJK, emoji, etc.), character deltas can differ substantially from UTF-8 byte deltas, so this metric reports incorrect units and can mislead dashboards/alerts that depend on byte savings. Compute the delta from UTF-8 byte lengths (or rename the metric to chars) before observing it.

Useful? React with 👍 / 👎.

Constraint: PR #1063 inherits the expanded Tier 1 tool surface from develop.
Rejected: Loosening the cursor verification assertion | it would stop catching accidental tool-surface drift.
Confidence: high
Scope-risk: narrow
Directive: Keep the exact count aligned whenever Tier 1 tool registration intentionally changes.
Tested: CI=true npx jest --config jest.ci.config.js --runInBand tests/cross-env/cursor-verification.test.ts
Not-tested: Full CI matrix pending on GitHub runners.

Co-authored-by: OmX <omx@oh-my-codex.dev>
@shaun0927
Copy link
Copy Markdown
Owner Author

@codex review

Constraint: PR #1063 must pass Windows CI where Git can check out JSON fixtures with CRLF line endings.
Rejected: Changing the v1.11.0 fixture payload | the response shape did not change.
Confidence: high
Scope-risk: narrow
Directive: Preserve strict JSON shape assertions while normalizing platform line endings at comparison time.
Tested: CI=true npx jest --config jest.ci.config.js --runInBand tests/tools/console-capture-regression.test.ts tests/metrics/tool-output-observability.test.ts tests/metrics/cookie-scan-metrics.test.ts
Not-tested: Full CI matrix pending on GitHub runners.

Co-authored-by: OmX <omx@oh-my-codex.dev>
@shaun0927
Copy link
Copy Markdown
Owner Author

@codex review

Merge current develop into the token-observability PR while preserving delta-compression metadata and the newer node_refs DOM output contract. The admin-key plaintext assertion remains resilient to unrelated worker console noise from CI.

Constraint: PR #1063 was merge-conflicting with develop and CI failed on macOS admin-key stdout noise before this review pass.
Rejected: Treating Jest console noise as CLI plaintext output | the CLI contract is about exactly one key token, not every captured stdout line.
Confidence: high
Scope-risk: moderate
Directive: Preserve machine-readable compression metadata when adding additional DOM response sections.
Tested: npm ci; npm run build; npm run lint; npm run lint:tier; npm run lint:tool-schemas; npx jest tests/cli/admin-keys.test.ts --runInBand --forceExit; git diff --check
Not-tested: Full multi-OS GitHub Actions matrix before push.

Co-authored-by: OmX <omx@oh-my-codex.dev>
@shaun0927
Copy link
Copy Markdown
Owner Author

@codex review

최신 develop 병합 및 macOS admin-key 테스트 안정화 반영 완료:

  • DOM delta compression metadata와 node_refs block 모두 보존
  • admin keys create 테스트가 stdout 전체 라인 수가 아닌 plaintext token 단일성만 검증하도록 충돌 해소

검증:

  • npm ci
  • npm run build
  • npm run lint
  • npm run lint:tier
  • npm run lint:tool-schemas
  • npx jest tests/cli/admin-keys.test.ts --runInBand --forceExit
  • git diff --check

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ed21ebaf73

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/tools/read-page.ts
content: [{ type: 'text', text: compressedText }],
_compression: {
level: 'delta',
originalChars: outputText.length,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Compute original delta size from full DOM response

In the delta path, _compression.originalChars is derived from outputText.length, but the fallback non-delta DOM response includes nodeRefsBlock and optional pagination text as well. On pages with large [node_refs] sections (or pagination metadata), this underestimates the baseline size and can hide real compression savings in observability metrics. Use the same full-response text shape used by the non-delta return when calculating the original size.

Useful? React with 👍 / 👎.

Comment thread src/mcp-server.ts
metrics.observe(
'openchrome_tool_compression_saved_bytes',
withTenantLabel({ tool: toolName, mode: level }),
originalChars - compressedChars,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Measure compression savings in bytes before reporting

This metric is named openchrome_tool_compression_saved_bytes, but it records originalChars - compressedChars from _compression. Since these values are currently based on string .length (UTF-16 code units), responses containing multi-byte UTF-8 characters will report incorrect byte savings. Convert both values to UTF-8 byte lengths (or rename the metric) so dashboards reflect actual payload reduction.

Useful? React with 👍 / 👎.

Merge the latest develop crawl-tool additions into PR #1063 and apply the same pipe-safe schema lint handling now required by the larger tool manifest.

Constraint: develop advanced after the first push and the manifest now exceeds the old pipe-safe size envelope on this branch too.
Rejected: Leaving the branch green only by current byte size | subsequent develop merges can cross the same threshold again.
Confidence: high
Scope-risk: moderate
Directive: Introspection commands must flush stdout and schema lint must stream stdin for large manifests.
Tested: npm run build; npm run lint:tool-schemas; npx jest tests/cli/admin-keys.test.ts --runInBand --forceExit; git diff --check
Not-tested: Full multi-OS GitHub Actions matrix before push.

Co-authored-by: OmX <omx@oh-my-codex.dev>
@shaun0927
Copy link
Copy Markdown
Owner Author

@codex review

develop 최신 변경(crawl_start/status/cancel)까지 추가 병합하고, 커진 manifest에서도 lint:tool-schemas가 안정적으로 동작하도록 pipe-safe introspection/schema-lint 수정을 이 PR에도 반영했습니다.

추가 검증:

  • npm run build
  • npm run lint:tool-schemas
  • npx jest tests/cli/admin-keys.test.ts --runInBand --forceExit
  • git diff --check

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 71054b8664

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/tools/read-page.ts
Comment on lines +607 to +608
originalChars: outputText.length,
compressedChars: compressedText.length,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Compute delta compression savings from comparable payloads

The _compression metadata for read_page delta responses currently compares originalChars from outputText.length against compressedChars from compressedText.length, but compressedText includes nodeRefsBlock/pagination sections while originalChars does not. In calls where those shared sections are non-trivial, this undercounts (or can invert) savings, which then suppresses or distorts openchrome_tool_compression_saved_bytes in recordToolOutputObservability. This makes the new observability metrics inaccurate specifically for DOM delta mode responses with larger node-ref or pagination blocks.

Useful? React with 👍 / 👎.

Make the HTTP bearer auth regression allocate its port range from the Jest process id before using per-case ports, avoiding collisions across parallel worktrees and CI shards.

Constraint: PR #1063 CI failed in the HTTP auth suite with socket hangups despite the targeted admin-key fix being green.
Rejected: Ignoring unrelated full-suite transport flakes | merge readiness requires the matrix to stay green.
Confidence: high
Scope-risk: narrow
Directive: Keep transport tests independent of globally fixed loopback ports.
Tested: npx jest tests/transports/http-bearer-auth.test.ts --runInBand --forceExit
Not-tested: Full multi-OS GitHub Actions matrix before push.

Co-authored-by: OmX <omx@oh-my-codex.dev>
@shaun0927 shaun0927 merged commit ecc27f6 into develop May 13, 2026
9 checks passed
@shaun0927
Copy link
Copy Markdown
Owner Author

@codex review

HTTP bearer auth 테스트 고정 포트 충돌을 process별 포트 range로 보강한 최신 커밋까지 리뷰 부탁드립니다.

@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@shaun0927
Copy link
Copy Markdown
Owner Author

@codex review\n\nAddressed current P1 observability feedback: output/token/compression/cache metrics now bucket unregistered tool names as unknown before emission, closing the denied-call label-cardinality vector while preserving registered tool labels.\n\n검증:\n- npm run build\n- npx jest tests/metrics/tool-output-observability.test.ts --runInBand --forceExit\n- npm run lint:tool-schemas\n- git diff --check

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ecc27f6abf

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/mcp-server.ts
Comment on lines +2063 to +2064
metrics.observe('openchrome_tool_output_bytes', withTenantLabel({ tool: toolName }), bytes);
metrics.observe('openchrome_tool_estimated_tokens', withTenantLabel({ tool: toolName }), estimateOutputTokensFromChars(payload.length));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Bucket unregistered tool labels before emitting metrics

recordToolOutputObservability emits tool labels using raw toolName, but this function is now called on tenant/scope-denied paths before tool lookup (src/mcp-server.ts around lines 1295-1342), where toolName comes directly from request input. That lets callers generate unbounded time-series cardinality by sending many distinct bogus tool names, which can exhaust metrics memory and scrape performance; unregistered names should be normalized to a bounded value (e.g., unknown) before emitting openchrome_tool_output_bytes, openchrome_tool_estimated_tokens, openchrome_tool_compression_saved_bytes, and openchrome_cache_status_total.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant